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Abstract 

We fit the parameters of a differential equations model describing 
the production of gap gene proteins Hunchback and Knirps along the 
antero-posterior axis of the embryo of Drosophila. As initial data for 
the differential equations model, we take the antero-posterior distri- 
bution of the proteins Bicoid, Hunchback and Tailless at the begin- 
ning of cleavage cycle 14. We calibrate and validate the model with 
experimental data using single- and multi-objective evolutionary op- 
timization techniques. In the multi-objective optimization technique, 
we compute the associated Pareto fronts. We analyze the cross regula- 
tion mechanism between the gap-genes protein pair Hunchback-Knirps 
and we show that the posterior distribution of Hunchback follow the 
experimental data if Hunchback is negatively regulated by the Hucke- 
bein protein. This approach enables to predict the posterior localiza- 
tion on the embryo of the protein Huckebein, and we validate with the 
experimental data the genetic regulatory network responsible for the 
antero-posterior distribution of the gap gene protein Hunchback. We 
discuss the importance of Pareto multi-objective optimization tech- 
niques in the calibration and validation of biological models. 

KEYWORDS: Genetic regulatory networks, Hunchback-Knirps cross 
regulation, Huckebein. 
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1 Introduction 



In the Drosophila egg, maternal mRNAs are placed near the poles of 
the oocyte by the mother's ovary cells, defining the antero-posterior 
axis of the embryo. Fertilization triggers the translation of these 
maternal mRNAs to proteins that regulate the expression of zygotic 
genes. Each of the zygotic genes is transcribed in certain regions of the 
embryo syncitial blastoderm, and the produced proteins act as tran- 
scription factors that regulate the expression of other zygotic genes. 

After fertilization, the first 13 nuclear divisions occur without the 
organization of cellular membranes, giving rise to a syncitial blasto- 
derm. The cytoplasmic membranes only become completely formed 
three hours after fertilization, in the interphase following the lA th mi- 
totic cycle, just before the onset of gastrulation. 

During the syncitial stage, the transcribed zygotic genes are di- 
vided in three main families: gap, pair-rule and segment polarity 
genes. The proteins resulting from their expression define broad seg- 
mentation patterns along the antero-posterior axis of the embryo. 
These segmentation patterns appear as protein gradients along the 



antero-posterior axis of the D r osoph i la emb r yo. IFrigerio et al 



1986: 



Driever and Nusslein-Volhardl . 1988 : Akam, 1987 : Niisslein-Volhard 



1992]. 



The proteins with origin in the maternal mRNAs form gradients 
along the antero-posterior axis of the embryo. In the beginning of 
cleavage cycle 14, proteins of maternal origin act as transcription fac- 
tors for gap-genes, pair-rule and segment polarity genes. 

There are several models aiming to describe proteins steady gra- 
dients in Drosophila early development. Some models are based on 
the hypothesi s of protein diffusion along the antero-posterior axis o f 



the embryo, Houchmandzadeh et al . 20051 : Alves and Dilao . 20061 ] 



and ot her models are based on the diffusion of m RNA of maternal 
origin, [Dilao and Murarol . l200fll : bilao et all 120091 ]. The protein dif- 
fusion hypothesis is sometimes justified by the absence of cellular 
membranes during the first 14 cleavage cycles of the embryo, and 
has been proposed by Nusslein-Volhar d and co-workers in the late 
eighties, Driever and Nusslein-Volhardl . 1988 ]. The mRNA diffusion 
hypothesi s is supported by th e recent observation of the mRNA Bicoid 



gradient, [Spirov et al 



2009 fl , and the a ssociated diffusion mechanism 



has been reported by [Cha et all 120011 ] that observed rapid saltatory 
movements in injected mRNA bicoid with dispersion but without lo- 
calization. ^AjwtheTjnatCTnal mRNA (nanos) has shown diffusive like 



Forrest and Gavis . 20031 ] . 



behavior, 

Here, we will be concerned with the calibration and validation of 
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the genetic regulatory network involving maternal proteins and the 
antero-posterior distribution of the gap genes Hunchback (HB) and 
Knirps (KNI) along the Drosophila embryo. One of the reasons for 
this study is that the regulation of the gradient of the HB protein in 
the posterior region o f the embryo of Drosophila is poorly understood, 



Margolis et~al\ [l995t | 



In order to calibrate the genetic regulatory network describing the 
production of the gap genes HB and KNI, we make some remarks on 
the biological assumption of our approach. 



1) 



Models assuming that proteins of maternal origin diffuse along the 
embryo need the additional assumption that these proteins are 

continuously produced and degraded. However thi s is unrealistic 

becau se: (i) Degradation has never been observed, [Kerszberg and Wolpertl . 



20071 ] ; (ii) There are no proteins in th e space around nuc l ei sug 



gesting that protein do not diffuse, Dilao and Murarol . 2009( | 
(iii) Protein diffusion models need a con dition on continuous pro 



duction of mRNA of matern al origin, [Houchmandzadeh et al. 



2005 : Alves and Dilaol . 20061 ] . a feature that has never been ob 



served. On the other hand, models based on mRNA diffusion do 
not show these unrealistic features, and are ab le to produce accu- 
rately gradients of protei ns of maternal origin, Dilao and Muraro 



20091 : iDilao et all l200fll ]. Anyway, the steady states obtained 



with the protein and the mRNA diffusion model s hav e the same 

functi onal form, (compare I Alves and Dilaol [20061 ] with lDilao and Murarol 
20091 ]). implying that the methodology followed here is not sen- 



sitive to the assumption about the diffusion model for proteins 
or mRNA of maternal origin. 

2) Threshold effects are important phenomen a for the establ ishment 
of positional information in the embryo. IWolpertl 19691 ] . It has 
been shown that the mass action law models and the associated 
conservation laws lead naturally to stable gradients along the em- 
bryo of Drosophila, without the need of ad-hoc threshold or diffu- 
sive effects at the level of gap gene expression. As a consequence, 
production models for gap gene proteins are simply described 
by ordinary differential equations models derived from the mass 
action law. Positional information is an emergent property of 
the mass action associated conservation laws. These biological 
assumptions have been teste d qu alitatively in | Alves and Dilao 
20051 ] . lAlves and Dilaol |2006l ] and bilao and Murarol [2009bl ]. 



3) The mechanism of protein production is described in two steps. 
In the first step, we describe the establishment of the steady gra- 
dients of proteins produced from mRNAs with maternal origin. 
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In the second step, we consider that the maternal proteins are 
transcription factors for the gap-gene proteins. In order to sim- 
plify the model equations and the number of parameters for the 
description of the gap gene protein production, we assume that 
maternal origin proteins are not consumed in the activation or 
repression of the gap gene proteins. In the case of the Hunchback 
protein, in a first step, we consider that the protein is produced 
from mRNA with maternal origin. In a second step, it is as- 
sumed that HB is zygotically produced. In the initial gap gene 
phase, the gap gene proteins other than Hunchback are assumed 
to have zero initial concentration. 

This paper is organized as follows. In section l2"7Tj we fit the experi- 
mental data of maternal proteins Bicoid (BCD), Hunchback (HB) and 
Tailless (TLL) with the equations for the steady state of a reaction- 
diffusion based model. The biological assumptions made are the ones 
described above in 1). The experim e ntal data were taken from the 



FlyEx database, Poustelnikova et all . I2004J : iPisarev et all 120091 ] , and 



the fits were obtained by an evolutionary search algorithm. In these 
fits, we reproduce accurately the experimental data for BCD, HB and 
TLL, and we determine along the antero-posterior axis of the embryo 
of Drosophila the initial localization of the mRNA of maternal origin. 

In section 12.21 we introduce the graph of the genetic network as- 
sociated with the production and the cross regulation of the gap gene 
proteins HB and Knirps (KNI) and we derive a mass action produc- 
tion model. Then, we describe the process of calibration of the pa- 
rameters of the model with the experimental data. The technique for 
parameter estimation is based on genetic algorithms with single- and 
multi-objective search techniques. As one of the main goals of this 
paper is to analyze the cross regulation of zygotically produced HB 
and KNI proteins, we have two objectives to fulfill. In this context, we 
find a continuous set of parameter solutions or Pareto front of the two- 
objectives optimization problem. This Pareto front corresponds to all 
possible admissible solutions of the bi-objective optimization problem. 
From the biological point of view, all the parameter solutions on the 
Pareto front are admissible and they correspond to different instances 
of the model parameters. All these Pareto solutions are very close to 
the experimental data and this has been evaluated by the chi-squared 
tests. 

In section [3l we describe the methodology of the multi-parameter 
fitting with evolutionary algorithms for one-objective and multi-objective 
optimization techniques. This section is essentially qualitative, de- 
scribing the geometry and structure of the algorithms. All the compu- 
tations are computationally involved and the programs are included in 
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the supplementary material to this paper, iDilao and Murarol |2009<j ] . 
Finally, in section [H we discuss the main conclusions of the paper. 



2 Results and Discussion 



2.1 Steady state models for the distribution 
of proteins with maternal origin 

The first stages of the establishment of the positional information for 
the cellular differentiation of the Drosophila embryo are determined 
by the initial distribution of maternal mRNAs and the corresponding 
produced proteins. Here, we consider three proteins whose gradients 
are established prior to the gap gene phase. These three proteins are 
Bicoid (BCD), Hunchback (HB) and Tailless (TLL). We fit the steady 



from the FlyEx database | 


iozlov et al.. 


2000; ] 


Vtvasnikova et al. 1999. 


2001; 


Poustelnikova et al\ 


20041; Pisarev et al. 


20091. 


http : / / flyex. ams . sunysb . edu/ flyex/| 



For the fits, we use a single-objective optimization technique for the 
distributions of BCD, HB and TLL. 

Hunchback and bicoid maternal mRNA are initially distributed 
along the antero-posterior axis of the embryo. The tailless gene is 
activated by the Torso (TOR) protein that has maternal origin. Here, 
we consider that TLL is produced directly from mRNA til which is 
not of maternal origin. This choice is a simplification in the model 
and the fit could be a lso obtained taking acc ount of the activation of 
the til gene by TOR, [Alves and D~iIaol . l2006t ] . 

To describe the steady states of BCD, HB and TLL, we assume a 
model for the production of proteins from the initial distribution of 
the associated mRNAs. In fact, we can adopt two alternative mod- 
els. In one model, the produced protein diffuses an d degrades along 



Alves and Dilao 



the e mbryo, leading to a gradient like steady state 
20061 ] . In a second alternative model, is the maternal mRNA that 
diffuses and degrades, leading to a gradient like steady state for the 
protein. The second model is exper imentally supporte d by the fact 



Spirov et al. 



20091 ] . It has been 



the bicoi d mRNA shows a g r adien t, 
shown in IDilao and Murarol [20091 ] that the protein steady states for 
both models have the same functional form, with parameters assum- 
ing different biological meanings. In the following, and without lack 
of generality, we assume the simple mRNA diffusion model for the 
production of proteins of maternal origin (Assumption 1) in §[1]). 

In order to arrive at the steady state functional forms for the dis- 
tribution of BCD, HB and TLL proteins along the antero-posterior 
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axis of the Drosophila embry o , we follow the mass action approach 
developed in lAlves and Dilaol [20051 ] and bilao and Murarol |2009bl ]. 
We consider the following kinetic diagrams for protein production, 



bed P ^ bed + BCD, 

hb PJ ^ hb + HB , 

Ptll 



bed 

dhb 



dbed 



til 



til + TLL . 



hb 
tll^h 



where capital letters represent proteins and the italic letters the corre- 
sponding mRNAs. The constants pbcd, Pub and ptll are the protein 
production rates from mRNAs, and dbed, dhb and d t u are mRNA degra- 
dation rates. By the mass action law, to the above kinetic diagrams 
correspond the equations for the concentrations, 



PBCDbcd(x) 



-7T- = -dhbhb + D hb - 



dbed 

~dT 
<9BCD 

dt 
dhb 

~di 
<9HB 

dtll 

~~dT 
dTLL 

dt 



-d bcd bcd(x) + D, 



bed" 



d 2 bcd 
dx 2 



d 2 hb 
dx 2 



PHBhb 



duitll + D 



til 



dHll 
dx 2 



PTLLtU 



(1) 

(2) 
(3) 
(4) 

(5) 
(6) 



This system of differential equations describe the production and dis- 
tribution of proteins and mRNA along the antero-posterior axis of the 
embryo of Drosophila. The antero-posterior axis is described by the 
independent coordinate x. The x-dependent diffusion terms do not 
follow from the mass action law, but they have been added in order to 
describe the diffusive motion of the mRNAs. The diffusion constants 
of the mRNAs are Dbed, Dhb and D t u. 

In order to solve this system of equations (HD-©, we now define 
boundary and initial conditions. Denoting by L the length of the em- 
bryo, we have that x £ [0, L\. Assuming zero flux boundary conditions 
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for mRNAs and proteins, we have, 



dbcd . 
dx 


= o,t) 


= 0, 


dbcd . 
dx 


L,i) = 


0, 


(7) 


9BCD 

dx {X 


= o,t) 


= 0, 


3BCD 

dx 


= L,t) 


= 


(8) 


dhb, 


= o,t) 


= 0, 


dhb, 
-dx~ {x = 


L,t) = 


0. 


(9) 


<9HB 


= 0,t) 


= 0, 


9HB 


= L,t) = 


= 


(10) 


dtll, 


= 0,t) 


= 0, 


dtll, 
-dx~ {x = 


L,t) = 


0. 


(11) 


<9TLL 

5x (X 


= 0,t) 


= 0, 


<9TLL 

0* (X 


= L,t) 


= 


(12) 



for every t > 0. As initial conditions, we take the piecewise constant 
functions, 



bcd(x, t = 0) 




(13) 



B > 0, if < Li < x < L 2 < L 
0, otherwise 

BCD(j;,t = 0) = 

H > 0, if < Mi < x < M 2 < L 

0, otherwise 

Ti > 0, if < iVi < x < N 2 < N 3 

T 2 > 0, if L 3 < x < L N < L 

0, otherwise 



for every x € [0, L\. The functions bcd(x,t = 0) and hb(x,t = 0) 
describe the distribution of bed and hb maternal mRNA in the regions 
[Li,L 2 ] and \M\,M.2\, respectively, of the antero-posterior axis of the 
embryo of Drosophila. The function tll(x,t = 0) is the distribution of 
the hypothetical til maternal mRNA in the region [iVi, JV2] U [A3, N4], 
and B, H, T\ and T 2 are constants. 

Equations ([I])-©, with boundary conditions ([7|)- (|12|) . and initial 
conditions (|13p define the mRNA diffusion model for BCD, HB and 

TLL production. This model is linear, and the steady states ~BCD e q {x), 

HB eg (x) and TLL e(? (x) can be obtained explicitly (for details see lDilao and Muraro 
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2 e 2a2-l C ° Sh ( G2 l)( Sillh ( G2 T) " Sinh (^T 



(14) 



2 e 2ajl_ 1 COSh ( a *l ) ( Sillh ( fl4 x) ~ Sillh (^17 

^ ^ e -04(«+Jltfi)/L _ e -a 4 ( X+ M 2 )/L^ + hh ( x) (15) 

2 e 2a 6 / a L 5 _ 1 cosh ( a ef ) ( (ae-^) - sinh (a 6 -^)) 

' .-atix+Nj/L _ e -a 6 (x+N 2 )/L\ + 



05 , 



2 e 2asTL-l C ° Sh ( G8 Z) ( Sillh ( G8 T) " Sillh ( G8 ir)) 

^^-a 8 (x+7V 3 )/L _ e -a 8(a;+ 7V 4 )/L^ + (lg) 



a ^ e -oa(£i-a!)/£ _ g-a 2 (L 2 -x)/L^ ^ if x < Li 

ai - «L ^ e -afl(a-Ia)/£ + e -a 2 (L 2 -z)/L^ j if Li < X < L 2 

a. ^ e -a 2 (x-L 2 )/L _ e -a 2 (x-Li)/L^ ? if X > L 2 

(17) 

ffiL ^ e -d4(Afi-x)/i _ e -a 4 (M 2 -x)/L^j ^ if X < M\ 

a 3 - M ^ e -tt4(*-Jltfi)/L + e -a 4 (M 2 -x)/L^j ^ ft Mi < X < M 2 

^ a4 ( x ~ M z)/ L — e —a4(x—Mi)/Lj if x > M-2 

(18) 

Ofi. ^ e -a e ,(N l -x)/L _ e -a 6 (N 2 -x)/L^ ^ if X < N\ 

ffi ^ e -a 6 (a;-JV2)/i _ g-a 6 (a;-./Vi)/£ j ^ if X > iV 2 

(19) 

^ ^ e -a 8 (Af 3 -x)/L _ e -a s (N 4 -x)/L\ ^ if X < N3 

°7 - ^(e-^ x - N ^/ L + e -«8(Af 4 -x)/L^ j if at 3 < x < AT 4 

92. ^ e -as(x-N4)/L _ e -a 8 (x-N 3 )/L^j ^ if X > N4 

(20) 
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and, 



Ol 


gPBCD 
dbcd 


a\ 


— "-bed n 

u bcd 


«3 


R PHB 
dhb 


a\ = 




«5 


T PTLL 

dtu 


n (> 


— <ku n 
L>m 


a 7 


T PTLL 

dm 




= dm „ 
i->m 



(21) 
(22) 
(23) 
(24) 

Note that a6 = «8- 

The steady states for the gradients of proteins BCD, HB and TLL 
are given by equations (|14p - (|24p . For the calibration of equations 
(|14|) - (|24|) with the experimental data, we have taken from the Fly Ex 
database the mean antero-posterior distributions of the proteins BCD, 
HB and TLL. These distributions have been calculated from the indi- 
vidual spatial distributions measured in 954 different embryos. These 
distributions are assumed to correspond to a steady state and, in the 
case of HB, the steady state is assumed to be established at the end 
of cleavage cycle 13. For the BCD and the TLL proteins, the steady 
state distribution corresponds to the beginning of cleavage cycle 14A. 
In Figs. [H [2] and [3l we show the mean values and the corresponding 
standard deviations of the gradients of proteins BCD, HB and TLL 
along the antero-posterior axis of the embryo of Drosophila. In these 
pictures, all the embryos have been scaled to the length L = 100. 

To fit the experimental data of BCD, HB and TLL with (|14D - 
(f24"|) . we have used an evolutionary search algorithm (see ^3. If) . and 
the choice of parameters has been done by minimizing the reduced 
chi-square functions, 

2 l^(BCD eg ( 

•^iiPx) BCD mean (xj)) 
Xbcd^Pi) ~ -A. BCD^fc) 

2 1 ^ (RB eq (xi,f 2 ) -RB mean (xi)) 2 
Xhb{P2) = iffi^) (25) 

2 / ^ \ 1 (TLL eg (xj, ps) — TLL mean (xj)) 2 
Xtll(Pb) ~ -X. TLL cr 2(x i ) 

where p~[ = (L±, L 2 , a%, 02) is the vector of the free parameters for the 
BCD production model, p" 2 = (Mi, M2, 03, 04) is the vector of the free 
parameters for the HB production model, and 

P3 = (N 1 ,N 2 ,N 3 ,N i ,a 5 ,a6,a 7 ,a 8 ) 
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is the vector of the free parameters for the TLL production model. 
The functions BCD mean (x), HB mean (x) and TLL mean (x) are the mean 
values of the protein concentrations along the antero-posterior axis 
of the embryo, and the functions BCD cr 2(x), HB CT 2(x) and TLL^x) 
are the associated standard deviations. In the fits, we have assumed 
that a§ and as are independent parameters and we have taken n = 
100. This assumption gives more plasticity to the data fitting and 
is based on the assumption that the goal of the fits is to find an 
accurate fitting function for TLL. The protein TLL is activated by the 
mater nal origin protein Tor so and this mechanism is not considered 
here, Alves and Dilao . 20061 ] . The results of the three calibrations are 
shown in Figs. [Q [2] and El and the fitted parameter values are listed 
in the figure captions. 




Figure 1: Dots and error bars represent the mean values and the standard 
deviations of the concentration of the protein Bicoid (BCD) along the antero- 
posterior axis of the embryo of Drosophila, at cleavage cycle 14A. The fit has 
been obtained with the steady state solution denned in CHI) . f|T7|) and f[2~Tl) . 
The parameter values found in the fit are: L\ = 0.00, L2 = 0.24, a\ = 186.83 
and (12 = 8.18. The reduced chi-squared value of this fit is Xbcd(p~i) = 0.13. 
The interval [L\ , L2] is the region where mRNA bed is deposited by the 
mother's ovary cells. 

From the fits in Figs. [H [2] and El we conclude that the steady state 
model describes well the distribution of proteins predicted from the 
mRNAs with maternal origin. The values of the reduced chi-squared 
test show that the agreement between data and fits are very good. 
If a model is successfully calibrated with experimental data, then it 
corresponds, with some degree of plausibility, to the mechanism that 
it pretends to describe. 
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Figure 2: Dots and error bars represent the mean values and the standard 
deviations of the concentration of the protein Hunchback (HB) along the 
antero-posterior axis of the embryo of Drosophila, at the end of cleavage 
cycle 13. The fit has been obtained with the steady state solution defined in 
(lT5"j) . (1T8T) and (1221) . The parameter values found in the fit are: M 1 = 0.04, 
M 2 = 0.47, a 3 = 85.09 and a 4 = 16.36. The reduced chi-squared value of 
this fit is Xhb{P2) = 0.02. The interval [Mx,M 2 ] is the region where mRNA 
hb is deposited by the mother's ovary cells. 



200 
150 
100 
50 


20 40 60 80 100 

Figure 3: Dots and error bars represent the mean values and the standard 
deviations of the concentration of the protein Tailless (TLL) along the antero- 
posterior axis of the embryo of Drosophila, at cleavage cycle 14A. The fit has 
been obtained with the steady state solution defined in (Tl6|) . (1T91 . (1201) . (J23]) 
and (1241) . The parameter values found in the fit are: N\ = 0.10, N 2 = 0.19, 
N 3 = 0.86, N 4 = 0.97, a 5 = 49.05, a 6 = 39.56, a 7 = 175.83 and a 8 = 30.33. 
The reduced chi-squared value of this fit is Xtll(p~3) = 0.03. 
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As already stated, the steady state solutions (fT4"|) -(fl~6 ]) are func- 
tionally similar to the ones obtai ned with the prot e in rea ction-diffusion 
mode l, compare equation (10) o f Alves and Dilao 200d ] with equation 
(4) of iDilao and Murarol [20091 ] . 

Programs and software tools for evolutionary algorithms optimiza- 
tion techniques and model c onstruction and analysis a re available in 
the supplementary material, IDilao and Murard 2009c ] . 

We are now in condition to make the calibration and validation of 
the gap gene proteins HB and KNI. 



2.2 Fitting the gap-genes 

To describe the production of gap gene proteins, we consider that 
BCD, HB and TLL proteins are in the steady state with a gradi- 
ent like distribution along the antero-posterior axis of the embryo of 
Drosophila, Figs. [Q [2] and [3J We consider that the production of the 
gap-genes proteins begins at the cleavage cycle 14 and, at this stage, 
we do not consider diffusion (Assumption 2) in §[]jj. We expect that 
the positional information is obtained by a t hreshold mechanism with 
diffusi on playing no role, Alves and Dilaol . 20051 ; Dilao and Muraro . 
2009bt So, to model the gap-gene transcriptional regulation of Hunch- 
back (HB) and Knirps (KNI), we take as initial conditions the antero- 
posterior distribution of BCD, HB and TLL, as found in the previous 
section. Then, we bu ild the regulatory netw ork following the mass 
action law strategy of Alves and Dilao 20051 ] and Dilao and Murarol 
2009bl ]. 



The basic pattern of gap-genes HB and KNI expression pattern 
is due to strong mutual repression between these genes. This com- 
plementarity is particularly clear in the experimental data for the 
couple HB-KNI at cleava ge cycle 14A-4, and has been confirmed in 
Jaeger and Reinita 20061 ] and earlier results, together with the repres- 
sion of TLL over KNI, affecting the posterior pole of the embryo. 

The gap-gene genetic regulatory network involving HB and KNI is 
displayed in Fig. 01 Associated with the regulatory network of Fig. [U 
we build the model for this genetic regulatory model based on the 
mass action law and following the description of transcriptiona l regu- 
lati on by the operon model a nd developed in Alves and Dilad 2005 ] 
and lDilao and Murarol |2009bl ]. Using the Mathematica software pack- 
age GeneticNetworks.m, we obtain the equations describing the time 
evolution of the gap gene protein concentrations. These differential 
equations involve the concentration of the proteins and of the gap 
genes with the different biding sites occupied or not. In the particular 
case of Fig. HI the full system of ordinary differential equations has 14 
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Figure 4: Genetic regulatory network graph associated with the cross regula- 
tion of the proteins Hunchback (HB) and Knirps (KNI) in Drosophila early 
development. The protein KNI is activated in the embryo by Bicoid (BCD). 
HB has a maternal origin and is also regulated by BCD. Both KNI and zy- 
gotically produced HB repress each other. In Fig. [21 we show the distribution 
of HB at the end of the maternal phase, before considering the regulation by 
BCD as described in this genetic network graph. 



equations and 23 free parameters ( Dilao and Murarol 2009c ]). 

In order to test the validity and completeness of the genetic reg- 
ulatory network in Fig. [4] we took from the FlyEx database the 
experimental data of the distribution of HB and KNI for the late 
cleavage cycle 14, and we have integrated numerically in Mathemat- 
ica the model equations generated by the GeneticNetworks.m soft- 
ware package. The free parameters on the model equations were 
determined with a bi-objective optimization technique ( 93.3|) . mini- 
mizing the mean squared deviations between the model solutions and 
the experimental data. Denoting by HB(x,t) and KNI(x,t) the solu- 
tions of the model equations, we have fitted the experimental data for 
the antero-posterior distribution of HB and KNI with the functions 
o/ l feHB(x, t) and Ofc n jKNI(x, t), where cthb and ctkni are proportionality 
constants. The introduction of the proportionality constants a^b and 
akni is due to the fact that experiments do not correspond to a direct 
measurement of local protein concentration, but it is proportional to 
protein concentration. These proportionality constants change from 
one protein to another. With these two additional proportionality 
constants and time as a free parameter, we have fitted the 23 param- 
eters of the model with a bi-objective optimization technique and we 
have calculated the associated Pareto front. 

In Fig. El we show this data and the corresponding fits. From the 
fitts, it is shown clearly that the genetic regulatory network of Fig. [4] 
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describes well the HB and the KNI distributions away from the pos- 
terior tip of the Drosophila embryo. Clearly, complementarity of the 
proteins HB and KNI in the middle region of the embryo is observed. 
This fact suggests that there are other genes that regulate the pos- 
terior region of the embryo. A plaus ible candidate is the Huckebein 
(HKB) protein. iMargolis et aJl |l995l |. 
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Figure 5: Dots and error bars represent the mean values and the standard 
deviations of the concentration of the proteins Hunchback (HB) and Knirps 
(KNI) along the antero-posterior axis of the embryo of Drosophila, at the 
end of cleavage cycle 14A. The fit has been obtained by a multi-objective 
optimization technique as described in § 13.31 The continuous lines correspond 
to the differential equation model solutions ahbHB(x,t*) and «fc n iKNI(x, £*), 
away from the steady state (t* < oo), and for a particular set of parameter 
values localized on the Pareto front of the bi-objective optimized solution. In 
this case, the fitted value of time is t* = 10 s, and the fitted proportionality 
constants have the values a^b = 0.1 and a^ni = 2.0. The penalized chi- 
squared values, fl27|) . of these fits are X#b(P4) — 0.28 and Xkni(p^) = 0.50, 
where p^ is the vector of the parameters that have been fitted. In this case, 
P = 23. In this model, this shows clearly that the genetic regulatory network 
of Fig. H] describes well the HB and KNI distribution away from the posterior 
tip of the embryo of Drosophila. 

In order to analyze the distribution of HB near the posterior region 
of the embryo, there is experimental evidence that Huckebein (HKB) 
protein has a band near the posterior pole of the embryo, repressing 
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the zygotic production of HB. Therefore, we introduce HKB in the 
gap gene regulatory network as in Fig. [6) As there is experimental 
evidence that HKB represses the production of HB near the posterior 
pole of the embryo, we assume a band type localization of HKB near 
the posterior tip of the embryo. 

bed hb 

I \ 

BCD— ►HB | HKB 




KNI| TLL 



Figure 6: Genetic regulatory network graph associated with the cross regu- 
lation of the proteins Hunchback (HB), Knirps (KNI) and Huckebein (HKB) 
in Drosophila early development. The transcription repres sion of HKB on 



the transcription of HB is described in iMargolis et al\ [1995 



By consistence with the model construction done in the previous 
section § 12.11 we assume that the HKB protein is localized with the 
following steady state distribution, 

BKB eq (x) = 2 — cosh (fo^) ( sinh (fo-jr) - sinh (b 2 ^j- 



e 2b 2 /L_i V'LJy \ L J V Z L 

+ h L-l»{x+I\)/L _ e - b2ix+ P 2 )/L\ + hcd{x) (26) 



where Pi, P 2 , b\ and b 2 are constants to be fitted and have the same 
meaning as the constants in the BCD equilibrium distribution pip . 
Under these conditions, we have introduced the HKB distribution into 
the previous model and we have repeated the bi-objective optimization 
analysis and we have calculated the associated Pareto front. In Fig. [7J 
we show one of the Pareto instances of the fit of the experimental 
data, as well as the fitted distribution of the protein HKB. 

The quality of the fits in Figs. [5] and [7J were evaluated from the 
penalized chi-square functions, 

1 ^ (HB eg (^,p)-HB 

mean ( x i)) 



Xhb(p) ~ HB ff2 (x 4 ) 

2 ^ 1 A (Km eq (x h p) - KNI„(^)) S 

XKNiiPl - ^p-f 2 2^ KNI~^) 



(27) 
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Figure 7: Dots and error bars represent the mean values and the standard 
deviations of the concentration of the proteins Hunchback (HB) and Knirps 
(KNI) along the antero-posterior axis of the embryo of Drosophila, at the end 
of cleavage cycle 14A. Due to the lack of experimental data on HKB its spa- 
tial experimental distribution is not represented. The fit has been obtained 
by a bi-objective optimization technique for HB and KNI, having also as free 
the parameters that describe the HKB distribution ([26]) . The continuous 
lines correspond to the differential equation model solutions ahbH.^(x, £*), 
CKfc n jKNI(x, £*) and HKB eg (x), for a particular set of parameter values local- 
ized on the Pareto front of the bi-objective optimized solution. In the case, 
the fitted value of time is t* = 29.1 s, and the fitted proportionality constants 
have the values = 0.11 and a^ni = 0.65. The penalized chi-squared value, 
(|27|) . of this fit are Xhb(p^) = 0-14 an d XknAps) = 0.59, where p^ is the 
vector of the parameters that have been fitted. In this case, P = 31. The 
parameter values found for the prediction of the HKB distribution (|26|) are: 
p x = 0.856, P 2 = 0.873, h = 296.74 and b 2 = 121.87. The HKB distribution 
found in the fit shows the existence of a stripe of the protein HKB near the 
posterior pole of the embryo as suggested experimentaly. With this fit, it is 
clearly shown that the genetic regulatory network of Fig. [6] describes well the 
HB and KNI distribution along all the antero-posterior axis of the embryo 
of Drosophila. 
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where p is the vector of the free parameters for the differential equation 
model and P is the dimension of the vector p. 

From the fits in Fig. we conclude that the transcriptional cross 
repression of HB over KNI and the transcriptional repression of HKB 
over HB describe well the spatial distributions of HB and KNI proteins 
along the antero-posterior axis of the embryo of Drosophila. This 
result also predicts the distribution of the protein HKB. 

Another important conclusion common to both fits is that gap gene 
protein expression is a dynamic process with a very fast expression 
time, of the order of 30 s (Fig. [7]). This expression time is calculated 
relative to the beginning of cleave stage 14A. 

Programs and software tools for multi-objective optimization tech- 
niques an d Pareto front solutions a re available in the supplementary 
material, Dilao and Murarol . 2009c ], 



3 Materials and Methods 



In this section, we briefly describe the algorithms that we have applied 
to calculate the parameters that best fit the experimental data to the 
model equations generated by the Mathematica software package Ge- 
neticNetworks.m. These algorithms are based on the Covariance Ma- 
trix Adaptation Evolution Strategy (CMA-ES) appro ach, an evolution- 

ary a l gorithm for bla ck-box continuous optimization, [Hansen and Qstermeierl . 



200 ll ; lHansenl . 120081 ] . The first algorithm is for single-objective opti- 
used in §2.11 and will be referred by CMA-ES. The sec- 



mization 

ond algorithm is the multi-objective version of CMA-ES, used in §2.2 
and uses several CMA-ES processes togethe r with a global Pareto- 



dominance based selection, [Igel et all 120071 ] . In a maximization or 



a minimization problem, there is a fitness function relative to which 
an optimization is found. In multi-objective optimization problems, 
there are several fitness functions, and in general when we optimize 
in order to a fitness function, we are worsening in order to the other 
fitness function. Pareto optimization is a way of obtaining optimal 
solutions that are not dominated in a certain sense by other solutions. 



3.1 Single-objective optimization: CMA-ES 

CMA-ES is an evolutionary algorithm that uses a population of \i 
parents to generate A offspring, and deterministically selects the best 
/i of those A offspring for the next generation. 

To have an idea of the parameter identification search problem, 
we take first a compact subset X of the parameter space S. The 
number of the parameter to be identified is the dimension of S. Set 
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an initial point po G X C S and let C = I n be a covariance matrix, 
where I n is the n x n identity matrix. Then, from the multivariate 
Gaussian distribution with covariance matrix C and mean value pq, 
sample A offsprings. For each offspring calculate a fitness function, in 
our case the, chi-squared distributions ([25]) . From the best n (< A) 
offsprings, according to the fitness function, recalculate a new mean 
value pq and a new (unbiased estimator) covariance matrix C, and 
repeat the procedure. After several iterations, the best individual ever 
found is a candidate for the be st ch oice of parame ters. For details see 
Hansen and Ostermeier 2001 ] and Hansen 20081 ] . Maternal protein 



distributions in Figs. [H [2] and [3] have been determined according to 
this technique. 



3.2 Pareto optimization 

Pareto optimization is concerned with the finding of the set of optimal 
trade-offs between conflicting objectives. Namely, Pareto solutions of 
a multi-objective problem are optimized solutions such that the value 
of one objective cannot be improved without degrading the value of at 
least another objective. Such best compromises are what is called the 
Pareto set of the multi-objective optimization problem. Pareto opti- 
mization is based on the notion of dominance. Consider a minimiza- 
tion problem with M real valued objective functions / = (/i, . . . , /m) 
defined on a subset X C ffi n . A solution of the optimization problem 
x G X is said to dominate another solution x G X, denoted by x -< x, 
if, 

Mm G {1,...,M}: 

(fm(x) < fm(x)) A (3m G {1, ... , M} : f m (x) < f m (x)) . 

The Pareto set of an optimization problem is the set of nondominated 
solutions of a minimization (maximization) problem. More formally, 

Pareto Set = {x : (x G X)A fix G X : x -< x} . 

The Pareto front is the image of the Pareto set in the fitness space, 

Pareto front = {f(x) : (x G X)A jBx G X : x -< x} . 

The goal of Pareto optimization is to find the Pareto set of opti- 
mized parameters and the Pareto front. Therefore, in a multi-objective 
approach, the natural choice for unbiased parameter estimation is the 
determination of the Pareto set of a given optimization problem. In 
this set, all the solutions are optimized solutions. The distributions 
of the gap gene proteins HB and KNI in Figs. [5] and [7] correspond 
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to parameter values on a Pareto set of the bi-objective optimization 
problem. I n general, al l the solutions on the Pareto set are equally 
acceptable, Dilao et al. , 20091 ] . 



3.3 Multi-objective optimization: MO-CMA- 
ES 

The Multi-Objective CMA-ES (MO-CMA-ES) optimization technique 
is based on the specific CMA-ES algorithm with a random choice 
of a large numb er of initial points in the search parameter space, 



Igel et a.Z.1 . 12Q07T ] . Once defined the multidimensional parameter search 
space, X, we proceed with the multi-objective optimization technique 
to determine the Pareto set and Pareto front of the two fitting prob- 
lems of § 12.21 The MO-CMA-ES techniques can be divided in three 
steps: 

1) In the compact search space X, choose randomly \x parents. For 

each parent, one offspring is generated with the CMA-ES algo- 
rithm. Initially, the CMA-ES algorithm is implemented with the 
identity as covariance matrix. 

2) We now rank the best fi individuals from the set of 2/j, individ- 

uals found previously. For that we use the concept of Pareto 
dominance. From the 2/j, individuals, we select the set of all 
the non-dominated individuals and we give them rank 1. We 
apply the same procedure to the remain i ng in dividuals and we 
obtain the rank 2 individuals, Deb et al. , 20021 ] . This procedure 



continues until a last rank is reached. 

3) In order to rank the individuals within the same rank of non- 
dominance found previously, we do a second ranking of indi- 
viduals within each rank. This second order ranking is done 
according to an hype rvolume measure in the objective space, 



Knowles et all 120031 ] . After this new ranking, we retain only 
the best individuals. With this procedure, we obtain an ap- 
proximation to the Pareto front with an approximately uniform 
distribution of individuals within each rank. Then, we repeat 
these three procedures until a good converge to the Pareto front 
is achieved. 

In Fig. [HI we show the Pareto front for the bi-objective optimization 
problem associated with the parameter identification describing the 
distribution of HB and KNI as shown in Fig. We show the position 
of the fit of Fig. [5] in the Pareto front of Fig. [HJ In Fig. [fU we show two 
other instances of the fits of HB and KNI proteins on the Pareto front. 
Comparing the three fits, we conclude that they are all acceptable. 



19 



In Fig.llOt we show the Pareto front for the fit of Fig.[7jand we mark 
the particular instance of the parameters of Fig. [7J In all the cases 
shown here, we conclude that the experimental data are optimally 
realized by an infinite set of parameters. This is particularly important 
in biology in the case of selection pressure affecting simultaneously 
several phenotypic characteristics of organisms. 



o 



"0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 

Xhb 

Figure 8: Pareto front for the fit of HB and KNI proteins of Fig. [5j In this bi- 
objective optimization problem, the coordinates of the fitness space are the 
reduced chi-squared functions Xhb(p) an< ^ Xkni(.P)i where the vector of the 
parameters p is a parameterization of the Pareto front. These functions have 
been calculated as in (12"5T) . The cross represents the particular instance of 
the parameter values of Fig. [5j The circles represent the two other instances 
of the HB and KNI fits that are shown in Fig. [9j 
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4 Conclusions and Final Remarks 

In order to analyze the expression of the gap gene protein Hunch- 
back along the antero-posterior axis of the embryo of Drosophila, we 
have introduced a genetic regulatory network model for the proteins 
HB and KNI and we have calibrated the experimental data with the 
model predictions. In the most complete version of the model of 
Fig. [6l we have shown that the distribution of HB and KNI along 
the antero-posterior axis of the embryo are in fact well described by a 
cross regulation mechanism together with the transcriptional repres- 
sion of HKB over HB. We have predicted the distribution of HKB in 
the form of a localized stripe near the posterior tip of the embryo. 
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Figure 9: Two instances of the fit of HB and KNI in the Pareto front, repre- 
sented by circles in Fig. [SJ In a) , we have the best fit for HB and the worst 
fit for KNI. IN b) we have the worst fit HB and the best fit for KNI. 




0.565 
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Figure 10: Pareto front for the fit of Hunchback, Knirps and Huckebein pro- 
teins of Fig. [3 The cross represents the particular instance of the parameter 
values of Fig. [71 
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Another important conclusion we have obtained is that these patterns 
are obtained as transient solutions of an ordinary differential equation 
model, with diffusion playing no role at the level of gap gene protein 
expression patterns. With this approach, diffusion is only relevant 
for the establishment of gradients for proteins produced from mRNA 
with maternal origin. The patterning obtained along the embryo re- 
sults from the differences in concentrations of the maternal proteins 
of the embryo. 

The calibration and validation of the genetic regulatory network 
models have been done with genetic algorithm techniques for param- 
eter identification. We have used single-objective and multi-objective 
techniques within the genetic algorithms formalism, and we have ana- 
lyzed the usefulness of the concept of Pareto optimization in biology. 
Due to similarities between the fits and the experimental data, it is 
plausible to think that, in the presence of several objectives, the num- 
ber of possible parametric solutions of a given problem is not unique, 
producing an infinite set of parameter instantiations. In this frame- 
work, the Pareto set and the Pareto front are the correct approach to 
analyze these problems. In the case of selection pressure on organisms 
affecting simultaneously several phenotypic characteristics, the Pareto 
type solutions appear as the right quantitative approach to quantify 
phenotypic variability. 

In the most difficult case of the multi-objective optimization prob- 
lem analyzed here, we have fitted 31 parameters in a system of or- 
dinary differential equations with 18 independent variables, and we 
have implemented these algorithms in a grid computing environment. 
In the supplementary material of this paper, we list all the algo- 
rithms and all the associa ted C files developed under this framework, 
Dilao and Murarol 2009c ]. These techniques are general and can be 



used in other parameter identification problems. 
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